6 research outputs found

Safeguarding Privacy Through Deep Learning Techniques

Author: Catelli Rosario
Publication date: 13/04/2021

Over the last few years, there has been a growing need to meet minimum security and privacy requirements. Both public and private companies have had to comply with increasingly stringent standards, such as the ISO 27000 family of standards, or the various laws governing the management of personal data. The huge amount of data to be managed has required a huge effort from employees who, in the absence of automatic techniques, have had to work tirelessly to achieve the certification objectives. Unfortunately, because the documentation relating to these problems contains sensitive information, it is difficult if not impossible to obtain material for research and study purposes on which to experiment with new ideas and techniques aimed at automating processes, perhaps exploiting current developments in the scientific community in the fields of ontologies and artificial intelligence for data management. To bypass this problem, it was decided to examine data from the medical world, which, especially for important reasons related to the health of individuals, have gradually become more freely accessible over time. This choice does not affect the generality of the proposed methods, which can be reapplied to the most diverse fields in which there is a need to manage privacy-sensitive information.

Università degli Studi di Napoli Federico II Open Archive

A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment

Author: Catelli Rosario
Fujita Hamido
Publication venue: IEEE
Publication date: 18/05/2023

‘Consiglio Nazionale delle Ricerche-CARI-CARE-ITALY’ within the CRUI CARE Agreement

Repositorio Institucional Universidad de Granada

Lexicon-Based vs. Bert-Based Sentiment Analysis: A Comparative Study in Italian

Author: Massimo Esposito
Rosario Catelli
Serena Pelosi
Publication venue: MDPI AG
Publication date: 26/01/2022

Recent developments in the e-commerce market have led consumers to attach increasing importance to third-party product reviews before proceeding to purchase. The industry, in order to improve its offer by intercepting consumer discontent, has paid increasing attention to systems able to identify the sentiment expressed by buyers, whether positive or negative. From a technological point of view, the literature in recent years has seen the development of two types of methodologies: those based on lexicons and those based on machine and deep learning techniques. This study proposes a comparison between these technologies in the Italian market, one of the largest in the world, exploiting an ad hoc dataset. Scientific evidence generally shows the superiority of language models such as BERT built on deep neural networks, but it raises several considerations about the effectiveness and improvement of these solutions when compared to those based on lexicons in the presence of datasets of reduced size such as the one under study, a common condition for languages other than English or Chinese.
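
As a rough illustration of the lexicon-based family of methods mentioned in this abstract, the sketch below sums per-word polarity scores and flips the sign of a word that immediately follows a negator. The lexicon entries, the scores, the negator set and the function name `lexicon_polarity` are all hypothetical and chosen only for illustration; they are not the resources or the method used in the study.

```python
# Hypothetical toy lexicon: words and scores are illustrative only,
# not taken from the study's resources.
TOY_LEXICON = {
    "ottimo": 1.0,      # "excellent"
    "buono": 0.5,       # "good"
    "pessimo": -1.0,    # "terrible"
    "deludente": -0.5,  # "disappointing"
}
NEGATORS = {"non"}  # assumed single-token negation marker


def lexicon_polarity(review: str) -> str:
    """Sum word scores, flipping the sign of a word right after a negator."""
    score = 0.0
    negate = False
    for token in review.lower().split():
        token = token.strip(".,;:!?")
        if token in NEGATORS:
            negate = True
            continue
        value = TOY_LEXICON.get(token, 0.0)
        score += -value if negate else value
        negate = False  # negation only reaches the next token
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(lexicon_polarity("Prodotto ottimo!"))
    print(lexicon_polarity("Non buono, anzi deludente."))
```

A scorer of this kind needs no training data, which is one reason lexicon-based methods remain of interest when the available dataset is small.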

Multidisciplinary Digital Publishing Institute

An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian

Author: Marco Pota
Massimo Esposito
Mirko Ventura
Rosario Catelli
Publication venue: MDPI AG
Publication date: 28/12/2020

Over the last decade, industrial and academic communities have increased their focus on sentiment analysis techniques, especially applied to tweets. State-of-the-art results have recently been achieved using language models trained from scratch on corpora made up exclusively of tweets, in order to better handle Twitter jargon. This work aims to introduce a different approach for Twitter sentiment analysis based on two steps. Firstly, the tweet jargon, including emojis and emoticons, is transformed into plain text, exploiting procedures that are language-independent or easily applicable to different languages. Secondly, the resulting tweets are classified using a BERT language model pre-trained on plain text instead of tweets, for two reasons: (1) models pre-trained on plain text are easily available in many languages, avoiding resource- and time-consuming model training directly on tweets from scratch; (2) available plain text corpora are larger than tweet-only ones, therefore allowing better performance. A case study describing the application of the approach to Italian is presented, with a comparison with other existing Italian solutions. The results obtained show the effectiveness of the approach and indicate that, thanks to its general basis from a methodological perspective, it can also be promising for other languages.
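
The first step described in this abstract, turning tweet jargon into plain text, might look roughly like the sketch below. The emoji and emoticon mappings, the `url` and `user` placeholder words, the hashtag handling and the function name `normalize_tweet` are assumptions made for illustration; the abstract does not specify the actual rules or resources. The second step (classification with a BERT model pre-trained on plain text) is not shown.

```python
import re

# Hypothetical mappings for illustration; the actual resources used in the
# work are not given in the abstract.
EMOJI_TO_TEXT = {"😀": "grinning face", "😢": "crying face"}
EMOTICON_TO_TEXT = {":)": "happy", ":(": "sad", ":D": "laughing"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def normalize_tweet(tweet: str) -> str:
    """Rewrite tweet-specific tokens as plain words (assumed rules)."""
    text = URL_RE.sub("url", tweet)      # assumed placeholder for links
    text = MENTION_RE.sub("user", text)  # assumed placeholder for mentions
    text = HASHTAG_RE.sub(r"\1", text)   # keep the hashtag word, drop '#'
    for symbol, description in {**EMOJI_TO_TEXT, **EMOTICON_TO_TEXT}.items():
        text = text.replace(symbol, f" {description} ")
    return " ".join(text.split())        # collapse repeated whitespace


if __name__ == "__main__":
    print(normalize_tweet("@anna Che bella giornata 😀 #sole https://t.co/abc :)"))
```

Because the output is ordinary text, it can be passed to any tokenizer and classifier trained on plain-text corpora, which is the motivation the abstract gives for this preprocessing.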

Multidisciplinary Digital Publishing Institute

A Novel COVID-19 Data Set and an Effective Deep Learning Approach for the De-Identification of Italian Medical Records

Author: Francesco Gargiulo
Giuseppe De Pietro
Hamido Fujita
Massimo Esposito
Rosario Catelli
Valentina Casola
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2021

In recent years, the need to de-identify privacy-sensitive information within Electronic Health Records (EHRs) has become increasingly pressing, in order to encourage the sharing and publication of their content in accordance with the restrictions imposed by both national and supranational privacy authorities. In the field of Natural Language Processing (NLP), several deep learning techniques for Named Entity Recognition (NER) have been applied to address this issue, significantly improving the effectiveness in identifying sensitive information in EHRs written in English. However, the lack of data sets in other languages has strongly limited their applicability and performance evaluation. To this end, a new de-identification data set in Italian has been developed in this work, starting from the 115 COVID-19 EHRs provided by the Italian Society of Radiology (SIRM): 65 were used for training and development, and the remaining 50 were used for testing. The data set was labelled following the guidelines of the i2b2 2014 de-identification track. As an additional contribution, a stacked word representation, not yet tested in the Italian clinical de-identification scenario, has been evaluated in combination with the best performing Bi-LSTM + CRF sequence labeling architecture. It is based both on a contextualized linguistic model, to manage word polysemy and its morpho-syntactic variations, and on sub-word embeddings, to better capture latent syntactic and semantic similarities. Finally, other cutting-edge approaches were compared with the proposed model, which achieved the best performance, highlighting the validity of the proposed approach.
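
To make the de-identification task concrete, the sketch below shows only a post-processing step: given tokens and per-token tags in the common BIO scheme, as a sequence labeler such as a Bi-LSTM + CRF might output, each tagged entity is replaced by a category placeholder. The function name `mask_entities`, the tag names `PATIENT` and `DATE`, and the example sentence are assumptions and fictitious data for illustration; they are not taken from the data set described in the abstract.

```python
from typing import List, Optional


def mask_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Replace each BIO-tagged entity with a single [LABEL] placeholder."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    masked: List[str] = []
    current: Optional[str] = None  # label of the entity being collapsed
    for token, tag in zip(tokens, tags):
        if tag == "O":
            masked.append(token)
            current = None
            continue
        prefix, _, label = tag.partition("-")
        # An I- tag continues the entity only if it matches the open label.
        if prefix == "I" and label == current:
            continue
        masked.append(f"[{label}]")
        current = label
    return masked


if __name__ == "__main__":
    # Fictitious sentence and hand-written tags, not model output.
    tokens = ["Il", "paziente", "Mario", "Rossi", "ricoverato", "il", "12/03/2020"]
    tags = ["O", "O", "B-PATIENT", "I-PATIENT", "O", "O", "B-DATE"]
    print(" ".join(mask_entities(tokens, tags)))
```

Collapsing a multi-token entity into one placeholder keeps the sentence readable while removing the sensitive span; other policies, such as substituting realistic surrogates, are equally possible.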

Directory of Open Access Journals

Characterization and antimicrobial resistance analysis of avian pathogenic Escherichia coli isolated from Italian turkey flocks

Author: Various authors (AAVV)
Allan
Altekruse
Barnes
Blanco
Caterina Lupini
Circella
D'Incau
Davide Giovanardi
Dho-Moulin
Elena Catelli
Giovanardi
Giovanni Ortali
Giulia Rossi
Gosling
Hemsley
Janßen
Johnson
Maurer
Nairn
Olsen
Patrizia Pesente
Petersen
Rosario
Russo
Singer
Sojka
Trampel
Van den Hurk
White
Yang
Zanella
Zhao
Publication venue: Oxford University Press (OUP)

Crossref